专利摘要:
METHOD OF OPERATING A DEVICE, DEVICE AND COMPUTER PROGRAM PRODUCT
A method of operating a device is provided, the device comprising a plurality of audio sensors and being configured so that when a first audio sensor of the plurality of audio sensors is in contact with a user of the device, a second audio sensor of the plurality of audio sensors is in contact with air, the method comprising obtaining respective audio signals representing speech of a user from the plurality of audio sensors; and analyzing the respective audio signals to determine which, if any, of the plurality of audio sensors is in contact with the user of the device.
公开号:BR112013012539B1
申请号:R112013012539-0
申请日:2011-11-21
公开日:2021-05-18
发明作者:Patrick Kechichian;Wilhelmus Andreas Marinus Arnoldus Maria Van Den Dungen
申请人:Koninklijke Philips N.V.;
IPC主号:
专利说明:

TECHNICAL FIELD OF THE INVENTION
The invention relates to a device comprising a plurality of audio sensors, such as microphones, and a method of operating the same, and in particular to a device configured so that when a first audio sensor of the plurality of audio sensors is in contact with a user of the device, a second sensor of the plurality of sensors is in contact with air.
BACKGROUND OF THE INVENTION
Mobile devices are often used in acoustically harsh environments (i.e., environments where there is a lot of background noise). In addition to problems with the user of the mobile device being able to hear the far end during two-way communication, it is difficult to obtain a 'clean' (i.e., noise-free or substantially noise-reduced) audio signal representing the speech of the user. In environments where the captured signal-to-noise ratio (SNR) is low, traditional speech processing algorithms can only perform a limited amount of noise suppression before the near-end speech signal (i.e., the signal obtained by the microphone on the mobile device) becomes distorted with 'musical tone' artifacts. It is known that audio signals obtained using a contact sensor such as a bone-conducted (BC) or contact microphone (i.e., a microphone in physical contact with the object producing the sound) are relatively immune to background noise compared to audio signals obtained using an air-conducted (AC) sensor such as a microphone (i.e., a microphone that is separated from the object producing the sound by air), since the sound vibrations measured by the BC microphone have propagated through the user's body rather than through the air as with a normal AC microphone, which, in addition to capturing the desired audio signal, also picks up background noise. Furthermore, the intensity of audio signals obtained using a BC microphone is generally much higher than that obtained using an AC microphone. As such, BC microphones have been considered for use in devices that may be used in noisy environments. Figure 1 shows that the BC signal is relatively immune to ambient noise compared with the AC signal, and illustrates the high-SNR properties of an audio signal obtained using a BC microphone relative to an audio signal obtained using an AC microphone in the same noisy environment. In figure 1 the vertical axis shows the amplitude of the audio signals.
However, a problem with speech obtained using a BC microphone is that its quality and intelligibility are generally much lower than speech obtained using an AC microphone. This reduction in intelligibility generally results from the filtering properties of bone and tissue, which can severely attenuate the high frequency components of the audio signal.
The quality and intelligibility of speech obtained using a BC microphone also depend on its specific location on the user. The closer the microphone is placed to the larynx and vocal cords around the neck or throat region, the better the resulting quality and intensity of the BC audio signal. Furthermore, since the BC microphone is in physical contact with the object producing the sound, the resulting signal has a higher SNR compared to an AC audio signal, which also picks up background noise.
However, although speech obtained using a BC microphone placed in or around the neck region will have a much higher intensity, its intelligibility will be quite low, which is attributed to the filtering of the signal from the glottis through the bones and soft tissue in and around the neck region and to the absence of the vocal tract transfer function.
The characteristics of the audio signal obtained using a BC microphone also depend on the housing of the BC microphone, that is, on whether it is shielded from background noise in the environment, as well as on the pressure applied to the BC microphone to establish contact with the user's body.
Thus, filtering or speech enhancement methods have been developed with the aim of improving the intelligibility of speech obtained from a BC microphone. These methods generally require either the presence of a clean speech reference signal for building an equalization filter to be applied to the audio signal from the BC microphone, or the training of user-specific models using a clean audio signal from an AC microphone. Alternative methods exist for improving the intelligibility of speech obtained from an AC microphone using properties of a speech signal from a BC microphone.
SUMMARY OF THE INVENTION
Mobile personal emergency response systems (MPERS) include a user-worn pendant or similar device that includes a microphone to allow the user to contact an emergency service in an emergency. As these devices can be used in noisy environments, it is desirable for the device to provide the best possible audio signal of the user's speech, so the use of BC microphones and AC microphones in these devices has been considered.
However, a pendant is free to move with respect to the wearer (e.g., by rotation), so the specific microphone in contact with the wearer can change over time (i.e., a microphone can be a BC microphone at one time and an AC microphone at another). It is also possible for none of the microphones to be in contact with the user at a given time (i.e., all microphones are AC microphones). This causes problems for the subsequent circuitry in the device that processes the audio signals to generate the enhanced audio signal, since specific processing operations are generally performed on particular (i.e., BC or AC) audio signals.
Thus, there is a need for a device and method to operate the same to solve this problem.
According to a first aspect of the invention, there is provided a method for operating a device, the device comprising a plurality of audio sensors and being configured so that when a first audio sensor of the plurality of audio sensors is in contact with a user of the device, a second audio sensor of the plurality of audio sensors is in contact with air, the method comprising obtaining respective audio signals representing speech of a user from the plurality of audio sensors; and analyzing the respective audio signals to determine which, if any, of the plurality of audio sensors is in contact with the user of the device. Preferably, the analysis step comprises analyzing the spectral properties of each of the audio signals. Even more preferably, the analyzing step comprises analyzing the power of the respective audio signals above a threshold frequency. It can be determined that an audio sensor is in contact with the user of the device if the power of its respective audio signal above the threshold frequency is less than the power of the audio signal of another audio sensor above the threshold frequency by more than a predetermined amount.
In a particular embodiment, the analysis step comprises applying an N-point Fourier transform to each audio signal; determining power spectrum information below a threshold frequency for each of the Fourier-transformed audio signals; normalizing the Fourier-transformed audio signals of the two sensors to each other according to the determined information; and comparing the power spectra above the threshold frequency of the normalized Fourier-transformed audio signals to determine which, if any, of the plurality of audio sensors is in contact with the user of the device.
In one implementation, the information determination step comprises determining the value of a maximum peak in the power spectrum below the threshold frequency for each of the Fourier-transformed audio signals, but in an alternative implementation the information determination step comprises summing the power spectrum below the threshold frequency for each of the Fourier-transformed audio signals.
It can be determined that an audio sensor is in contact with the user of the device if the power spectrum above the threshold frequency for its respective Fourier-transformed audio signal is less than the power spectrum above the threshold frequency for the Fourier-transformed audio signal of another audio sensor by more than a predetermined amount.
It can be determined that no audio sensor is in contact with the user of the device if the power spectra above the threshold frequency for the Fourier-transformed audio signals differ by less than a predetermined amount.
Preferably, the method further comprises the step of providing the audio signals to circuitry that processes the audio signals to produce an output audio signal representing the user's speech according to the result of the analysis step.
According to a second aspect of the invention, there is provided a device comprising a plurality of audio sensors arranged in the device so that when a first audio sensor of the plurality of audio sensors is in contact with a user of the device, a second audio sensor of the plurality of audio sensors is in contact with air; and circuitry that is configured to obtain respective audio signals representing speech of a user from the plurality of audio sensors, and to analyze the respective audio signals to determine which, if any, of the plurality of audio sensors is in contact with the user of the device. Preferably, the circuitry is configured to analyze the power of the respective audio signals above a threshold frequency.
In a particular embodiment, the circuitry is configured to analyze the respective audio signals by applying an N-point Fourier transform to each audio signal; determining power spectrum information below a threshold frequency for each of the Fourier-transformed audio signals; normalizing the Fourier-transformed audio signals of the two sensors to each other according to the determined information; and comparing the power spectra above the threshold frequency of the normalized Fourier-transformed audio signals to determine which, if any, of the plurality of audio sensors is in contact with the user of the device.
Preferably, the device further comprises processing circuitry for receiving the audio signals and for processing the audio signals in accordance with the analysis result to produce an output audio signal representing the user's speech.
According to a third aspect of the invention, there is provided a computer program product comprising computer readable code which is configured such that, upon execution of the computer readable code by a suitable computer or processor, the computer or processor performs the method described above.
BRIEF DESCRIPTION OF THE DRAWINGS
Exemplary embodiments of the invention will now be described, by way of example only, with reference to the following drawings, in which:
Figure 1 illustrates the high-SNR properties of an audio signal obtained using a BC microphone with respect to an audio signal obtained using an AC microphone in the same noisy environment;
Figure 2 is a block diagram of a pendant including two microphones;
Figure 3 is a block diagram of a device according to a first embodiment of the invention;
Figures 4A and 4B are graphs showing a comparison of the spectral power densities of the signals obtained from a BC microphone and an AC microphone with and without background noise, respectively;
Figure 5 is a flowchart illustrating a method according to an embodiment of the invention;
Figure 6 is a flowchart illustrating a method according to a more specific embodiment of the invention;
Figure 7 is a graph showing the result of the action of a BC/AC discriminator module in a device according to the invention; and
Figure 8 is a block diagram of a device according to a second embodiment of the invention;
Figure 9 is a graph showing the result of speech detection performed on a signal obtained using a BC microphone;
Figure 10 is a graph showing the result of applying a speech enhancement algorithm to a signal obtained using an AC microphone;
Figure 11 is a graph showing a comparison between the signals obtained using an AC microphone in a clean and noisy environment and the output of the method according to the invention;
Figure 12 is a graph showing a comparison between the spectral power densities of these three signals shown in Figure 11; and
Figure 13 shows a wired handsfree kit for a mobile phone including two microphones.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
With reference to figure 2, a device 2, in the form of a pendant, comprises two sensors 4, 6 arranged on opposite sides or surfaces of the pendant 2 so that when one of the two sensors 4, 6 is in contact with the user, the other sensor is in contact with air. The sensor 4, 6 in contact with the user will act as a bone-conducted or contact sensor (and provide a BC audio signal) and the sensor 4, 6 in contact with air will act as an air-conducted sensor (and provide an AC audio signal). Sensors 4, 6 are generally of the same type and configuration. In the illustrated embodiments, sensors 4, 6 are microphones, which can be based on MEMS technology. Those skilled in the art will appreciate that sensors 4, 6 can be implemented using other types of sensor or transducer. The device 2 can be attached to a cord so that it can be worn around a user's neck. The cord and the device can be arranged so that the device, when worn as a pendant, has a predetermined orientation with respect to the user's body, ensuring that one of the sensors 4, 6 is in contact with the user. Alternatively, the device can be shaped so that its orientation does not change in use due to the user's movement, preventing the contact of the sensor with the user from being lost. The shape of the device can, for example, be rectangular.
A block diagram of a device 2 according to the invention is shown in figure 3. As described above, device 2 comprises two microphones: a first microphone 4 and a second microphone 6, which are positioned on the device 2 so that when one of the microphones 4, 6 is in contact with a part of the user, the other microphone 4, 6 is in contact with air.
The first microphone 4 and the second microphone 6 operate simultaneously (i.e., they capture the same speech at the same time) to produce respective audio signals (identified as m1 and m2 in Figure 3). The audio signals are provided to a discriminator block 7, which analyzes the audio signals to determine which, if any, corresponds to a BC audio signal and which to an AC audio signal. The discriminator block 7 then outputs the audio signals to processing circuit 8, which performs processing to improve the speech quality of the audio signals.
The processing circuit 8 can apply any known speech enhancement algorithm to the BC audio signal and the AC audio signal to generate a clean (or at least enhanced) output audio signal representing the user's speech. The output audio signal is provided to transmitter circuit 10 for transmission via antenna 12 to another electronic device (such as a mobile phone or a base station for the device).
If the discriminator block 7 determines that neither microphone 4, 6 is in contact with the user's body, then the discriminator block 7 can output the AC audio signals to the processing circuit 8, which then performs an alternative speech enhancement method based on the presence of several AC audio signals (e.g., beamforming). It is known that the high speech frequencies in a BC audio signal (e.g., frequencies above 1 kHz) are attenuated due to the transmission medium, which is demonstrated by the graphs in figure 4 showing a comparison of the spectral power densities of the BC and AC audio signals in the presence of diffuse white background noise (Figure 4A) and in the absence of background noise (Figure 4B). This property can therefore be used by the discriminator block 7 to differentiate between BC and AC audio signals.
An exemplary embodiment of a method according to the invention is shown in figure 5. In step 101, the respective audio signals are obtained simultaneously using the first microphone 4 and the second microphone 6, and the audio signals are provided to the discriminator block 7. Then, in steps 103 and 105, the discriminator block 7 analyzes the spectral properties of each of the audio signals and detects which, if any, of the first and second microphones 4, 6 is in contact with the user's body based on those spectral properties. In one embodiment, the discriminator block 7 analyzes the spectral properties of each of the audio signals above a threshold frequency (for example, 1 kHz). However, a difficulty arises from the fact that the two microphones 4, 6 may not be calibrated, i.e., the frequency responses of the two microphones 4, 6 may be different. In this case, a calibration filter (not shown in the figures) can be applied to one of the microphone signals before the processing by the discriminator block 7 continues, so that the frequency responses of the two microphones are matched.
In the following operation, the discriminator block 7 compares the spectra of the audio signals from the two microphones 4, 6 to determine which audio signal, if any, is a BC audio signal. If the microphones 4, 6 have different frequency responses, this can be corrected with a calibration filter during the production of device 2, so that the different microphone responses do not affect the comparisons made by the discriminator block 7.
Even if this calibration filter is used, it is still necessary to account for gain differences between the AC and BC audio signals, since the intensities of the AC and BC audio signals are different in addition to their spectral characteristics (in particular at frequencies above 1 kHz). Thus, the discriminator block 7 normalizes the spectra of the two audio signals above the threshold frequency (for discrimination purposes only) based on the global peaks found below the threshold frequency, and compares the spectra above the threshold frequency to determine which signal, if any, is a BC audio signal. If this normalization were not performed, then, due to the high intensity of a BC audio signal, it could be determined that the power at the higher frequencies is higher in the BC audio signal than in the AC audio signal, which should not be the case. A particular embodiment of the invention is shown in the flowchart of figure 6. In the following, it is assumed that any calibration necessary to account for the differences in the frequency responses of microphones 4, 6 has been carried out, and that the respective audio signals from the BC microphone 4 and the AC microphone 6 have been time-aligned using appropriate time delays before the further processing of the audio signals described below. In step 111, the respective audio signals are obtained simultaneously using the first microphone 4 and the second microphone 6 and provided to the discriminator block 7.
In step 113, the discriminator block 7 applies an N-point (one-sided) fast Fourier transform (FFT) to the audio signals from each microphone 4, 6, i.e.,
Mi(ωk) = Σ_{n=0}^{N−1} mi(n)·e^(−j·2πkn/N), k = 0, 1, …, N−1,
to produce N frequency bins between ω = 0 radians (rad) and ω = 2πfs rad, where fs is the sampling frequency in Hertz (Hz) of the analog-to-digital converters that convert the analog microphone signals to the digital domain. Apart from the first N/2+1 bins, up to and including the Nyquist frequency πfs, the remaining bins can be discarded. The discriminator block 7 then uses the FFT result for the audio signals to calculate the power spectrum of each audio signal.
Then, in step 115, the discriminator block 7 finds the maximum peak value of the power spectrum among the frequency bins below a threshold frequency ωc, i.e.,
pi = max_{ωk < ωc} |Mi(ωk)|², i = 1, 2,
and uses the maximum peaks to normalize the power spectra of the audio signals above the threshold frequency ωc. The threshold frequency ωc is selected as a frequency above which the spectrum of a BC audio signal is generally attenuated with respect to that of an AC audio signal. The threshold frequency ωc can be, for example, 1 kHz. Each frequency bin contains a single value, which, for the power spectrum, is the squared magnitude of the frequency response in that bin, |Mi(ωk)|².
Alternatively, in step 115 the discriminator block 7 can find the summed power spectrum below ωc for each audio signal, i.e.,
pi = Σ_{ωk < ωc} |Mi(ωk)|², i = 1, 2,
and can normalize the power spectra of the audio signals above the threshold frequency ωc using the summed power spectra.
As the low-frequency bins of an AC audio signal and a BC audio signal should contain approximately the same low-frequency information, the values of p1 and p2 are used to normalize the signal spectra of the two microphones 4, 6, so that the high-frequency bins of the two audio signals can be compared (where discrepancies between a BC audio signal and an AC audio signal are expected to be found) and a potential BC audio signal identified. In step 117, the discriminator block 7 then compares the power of the signal spectrum from the first microphone 4 with the normalized power of the signal spectrum from the second microphone 6 in the higher-frequency bins, i.e., it compares
Σ_{ωk > ωc} |M1(ωk)|² with (p1/(p2+ε))·Σ_{ωk > ωc} |M2(ωk)|²,
where ε is a small constant to prevent division by zero, and p1/(p2+ε) represents the normalization of the spectrum of the second audio signal (although it will be appreciated that the normalization could instead be applied to the first audio signal).
Provided that the difference between the powers of the two audio signals is greater than a predetermined amount (which depends on the location of the bone-conducted microphone and can be determined experimentally), the audio signal with the highest power in the normalized spectrum above ωc is determined to be the audio signal from an AC microphone, and the audio signal with the lowest power is determined to be the audio signal from a BC microphone.
However, if the difference between the powers of the two audio signals is less than the predetermined amount, then it is not possible to positively determine that either of the audio signals is a BC audio signal (and it may be that neither microphone 4, 6 is in contact with the user's body). It will be noted that instead of calculating the modulus squared in the equations above in step 117, it is possible to calculate the modulus values. It will also be noted that alternative comparisons between the powers of the two signals can be made in step 117 using a bounded index, so that uncertainties can be taken into account in the decision making. For example, a bounded power index at frequencies above the threshold frequency can be determined, with the index being bounded between -1 and 1, and with values close to 0 indicating uncertainty as to which microphone, if any, is a BC microphone. The discriminator block 7 includes a switching circuit which outputs the audio signal determined to be a BC audio signal to a BC audio signal input of the processing circuit 8, and the audio signal determined to be an AC audio signal to an AC audio signal input of the processing circuit 8. The processing circuit 8 then performs a speech enhancement algorithm on the BC audio signal and the AC audio signal to generate a clean (or at least improved) output audio signal which represents the user's speech. If, due to uncertainty, both audio signals are determined to be AC audio signals, the switching circuit in the discriminator block 7 can output the signals to alternative audio signal inputs of the processing circuit 8 (not shown in figure 3). The processing circuit 8 can then treat both audio signals as AC audio signals and process them using conventional two-microphone techniques, for example combining the AC audio signals using beamforming techniques. In an alternative embodiment, the switching circuit can form part of the processing circuit 8, which means that the discriminator block 7 can output the audio signal from the first microphone 4 to a first audio signal input of the processing circuit 8 and the audio signal from the second microphone 6 to a second audio signal input of the processing circuit 8, with a signal 13 indicating which, if any, of the audio signals is a BC or AC audio signal. The graph in Figure 7 illustrates the operation of the discriminator block 7 described above during a test procedure. In particular, during the first 10 seconds of the test, the second microphone 6 is in contact with a user (thus providing a BC audio signal), which is correctly identified by the discriminator block 7 (as shown in the lower graph). In the next 10 seconds of the test, the first microphone 4 is in contact with the user (thus providing a BC audio signal), and this is again correctly identified by the discriminator block 7.
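As an illustration only, a minimal numerical sketch of the discrimination logic of steps 111 to 117 is given below, assuming two time-aligned, calibrated signals sampled at 8 kHz, a 1 kHz threshold frequency and the peak-based normalization; the function name, the decision margin and the exact form of the bounded index are assumptions made for the example and are not taken from the patent.

```python
import numpy as np

def classify_bc_ac(m1, m2, fs=8000, n_fft=1024, f_c=1000.0, margin=2.0, eps=1e-12):
    """Decide which of two time-aligned microphone signals, if any, is the
    bone-conducted (BC) one, using a normalized high-frequency power
    comparison (illustrative sketch, not the patented implementation)."""
    # One-sided N-point FFTs and power spectra
    P1 = np.abs(np.fft.rfft(m1, n_fft)) ** 2
    P2 = np.abs(np.fft.rfft(m2, n_fft)) ** 2
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    low, high = freqs < f_c, freqs >= f_c

    # Normalize using the maximum low-frequency peak of each power spectrum
    p1, p2 = P1[low].max(), P2[low].max()
    hf1 = P1[high].sum()
    hf2 = (p1 / (p2 + eps)) * P2[high].sum()   # second spectrum normalized to the first

    # One possible bounded index in [-1, 1]; values near 0 indicate uncertainty
    index = (hf1 - hf2) / (hf1 + hf2 + eps)

    if hf1 > margin * hf2:
        return "m1 = AC, m2 = BC", index   # BC signal has less high-frequency power
    if hf2 > margin * hf1:
        return "m1 = BC, m2 = AC", index
    return "uncertain (treat both as AC)", index
```

Note that the multiplicative margin used here stands in for the "predetermined amount" mentioned in the text and would have to be chosen experimentally.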
Figure 8 shows an embodiment of the processing circuit 8 of a device 2 according to the invention in more detail. The device 2 generally corresponds to that shown in figure 3, with features that are common to both being identified with the same reference numerals.
Thus, in this embodiment, the processing circuit 8 comprises a speech detection block 14 which receives the BC audio signal from the discriminator block 7, a speech enhancement block 16 which receives the AC audio signal from the discriminator block 7 and the output of the speech detection block 14, a first feature extraction block 18 which receives the BC audio signal, a second feature extraction block 20 which receives the output of the speech enhancement block 16, and an equalizer 22 which receives the outputs of the first feature extraction block 18 and the second feature extraction block 20 and produces the output audio signal of the processing circuit 8. The processing circuit 8 also includes a further circuit 24 for processing the audio signals from the first and second microphones 4, 6 when it is determined that both audio signals are AC audio signals. If used, the output of this circuit 24 is provided to the transmitter circuit 10 in place of the output audio signal from the equalizer block 22. In brief, the processing circuit 8 uses properties or features of the BC audio signal and a speech enhancement algorithm to reduce the amount of noise in the AC audio signal, and then uses the noise-reduced AC audio signal to equalize the BC audio signal. The advantage of this particular audio signal processing method is that, while the noise-reduced AC audio signal may still contain artifacts and/or noise, it can be used to improve the frequency characteristics of the BC audio signal (which generally does not contain speech artifacts) so that it sounds more intelligible. The speech detection block 14 processes the received BC audio signal to identify the parts of the BC audio signal that represent speech by the user of the device 2. The use of the BC audio signal for speech detection is advantageous because of the BC microphone 4's relative immunity to background noise and its high SNR.
The speech detection block can perform speech detection by applying a simple threshold technique to the BC audio signal, where speech periods are detected when the amplitude of the BC audio signal is above a threshold value.
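As a simple sketch of such a threshold-based detector (the frame length and threshold value below are illustrative assumptions, not values specified in the text):

```python
import numpy as np

def detect_speech(bc_signal, frame_len=256, threshold=0.02):
    """Mark frames of a BC audio signal as speech when their peak amplitude
    exceeds a fixed threshold (illustrative values)."""
    n_frames = len(bc_signal) // frame_len
    flags = np.zeros(n_frames, dtype=bool)
    for i in range(n_frames):
        frame = bc_signal[i * frame_len:(i + 1) * frame_len]
        flags[i] = np.max(np.abs(frame)) > threshold
    return flags
```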
In other embodiments of the processing circuit 8, it is possible to suppress noise in the BC audio signal based on minimum statistics and/or beamforming techniques (in case more than one BC audio signal is available) before speech detection is performed.
The graphs in Figure 9 show the result of the operation of the speech detection block 14 on a BC audio signal.
The output of the speech detection block 14 (shown at the bottom of figure 9) is provided to the speech enhancement block 16 along with the AC audio signal. Compared to the BC audio signal, the AC audio signal contains background noise from mobile or stationary noise sources, so speech enhancement is performed on the AC audio signal so that it can be used as a reference for the subsequent enhancement (equalization) of the BC audio signal. One effect of the speech enhancement block 16 is to reduce the amount of noise in the AC audio signal.
Many different types of speech enhancement algorithms are known and can be applied to the AC audio signal by block 16, and the particular algorithm used may depend on the configuration of the microphones 4, 6 on device 2, as well as on how device 2 is to be used.
In particular embodiments, the speech enhancement block 16 applies some form of spectral processing to the AC audio signal. For example, the speech enhancement block 16 may use the output of the speech detection block 14 to estimate the noise floor in the spectral domain of the AC audio signal during the non-speech periods determined by the speech detection block 14. The noise floor estimates are updated whenever speech is not detected.
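A sketch of this idea is shown below, updating a spectral noise-floor estimate only in frames where the BC-based speech detector is inactive and applying a simple spectral-subtraction-style gain; the smoothing constant, gain floor and function names are illustrative assumptions rather than details of the speech enhancement block 16.

```python
import numpy as np

def enhance_ac(ac_frames, speech_flags, alpha=0.9, gain_floor=0.1):
    """ac_frames: sequence of time-domain frames of the AC signal;
    speech_flags: boolean per frame from the BC-based speech detector.
    Returns enhanced frames using a noise floor estimated during non-speech."""
    noise_psd = None
    out = []
    for frame, is_speech in zip(ac_frames, speech_flags):
        spec = np.fft.rfft(frame)
        psd = np.abs(spec) ** 2
        if not is_speech:
            # Update the noise-floor estimate only when no speech is detected
            noise_psd = psd if noise_psd is None else alpha * noise_psd + (1 - alpha) * psd
        if noise_psd is None:
            out.append(frame)          # no noise estimate yet: pass through
            continue
        # Spectral-subtraction-style gain, floored to limit musical noise
        gain = np.maximum(1.0 - noise_psd / (psd + 1e-12), gain_floor)
        out.append(np.fft.irfft(gain * spec, n=len(frame)))
    return out
```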
In embodiments where device 2 is designed to have more than one AC sensor or microphone (i.e., multiple AC sensors in addition to a sensor that is in contact with the user), the speech enhancement block 16 may also apply some form of microphone beamforming.
The upper graph in figure 10 shows the AC audio signal obtained from the AC microphone 6, and the lower graph in figure 10 shows the result of applying the speech enhancement algorithm to the AC audio signal using the output of the speech detection block 14. It can be seen that the background noise level in the AC audio signal is sufficient to produce an SNR of approximately 0 dB, and the speech enhancement block 16 applies a gain to the AC audio signal to suppress the background noise by almost 30 dB. However, it can also be seen that, although the amount of noise in the AC audio signal has been significantly reduced, some artifacts remain. The noise-reduced AC audio signal is then used as a reference signal to increase the intelligibility of (i.e., to enhance) the BC audio signal.
In some embodiments of the processing circuit 8, it is possible to use long-term spectral methods to build an equalization filter; alternatively, the BC audio signal can be used as an input to an adaptive filter that minimizes the mean squared error between the filter output and the enhanced AC audio signal, with the filter output providing an equalized BC audio signal. Yet another alternative makes use of the assumption that a finite impulse response can model the transfer function between the BC audio signal and the enhanced AC audio signal. Using an adaptive filter with the BC audio signal as an input and the enhanced AC audio signal as a reference, the output of the adaptive filter is an equalized BC audio signal. In these embodiments, it will be noted that the equalizer block 22 requires the original BC audio signal in addition to the features extracted from the BC audio signal by the feature extraction block 18. In this case, there will be an extra connection between the BC audio signal input line and the equalizer block 22 in the processing circuit 8 shown in figure 8.
However, methods based on linear prediction may be better suited to improving the intelligibility of speech in a BC audio signal; thus, preferably, the feature extraction blocks 18, 20 are linear prediction blocks which extract the linear prediction coefficients from both the BC audio signal and the noise-reduced AC audio signal, and these coefficients are used to build an equalization filter, as described further below.
Linear prediction (LP) is a speech analysis tool based on the source-filter model of speech production, where the source and the filter correspond to the glottal excitation produced by the vocal cords and the shape of the vocal tract, respectively. The filter is assumed to be all-pole. Thus, the LP analysis provides an excitation signal and a frequency-domain envelope represented by the all-pole model, which is related to the properties of the vocal tract during speech production. The model is given as
y(n) = Σ_{k=1}^{p} ak·y(n−k) + G·u(n),
where y(n) and y(n−k) correspond to the present and past samples of the signal under analysis, u(n) is the excitation signal with gain G, ak represents the prediction coefficients, and p is the order of the all-pole model.
The purpose of the LP analysis is to estimate the values of the prediction coefficients from the audio speech samples, so as to minimize the prediction error
e(n) = y(n) − Σ_{k=1}^{p} ak·y(n−k),
where the error actually corresponds to the excitation source in the source-filter model; e(n) is the part of the signal that cannot be predicted by the model, since the model can only predict the spectral envelope, and in reality it corresponds to the pulses generated by the glottis in the larynx (vocal cord excitation). It is known that additive white noise severely affects the estimation of the LP coefficients, and that the presence of one or more additional sources in y(n) leads to the estimation of an excitation signal that includes contributions from these sources. Therefore, it is important to acquire a noise-free audio signal that contains only the desired source signal in order to estimate the correct excitation signal. The BC audio signal is such a signal. Because of its high SNR, the excitation source e can be correctly estimated using the LP analysis performed by the linear prediction block 18. This excitation signal e can then be filtered using the all-pole model estimated by LP analysis of the noise-reduced AC audio signal. Because the all-pole filter represents the smoothed spectral envelope of the noise-reduced AC audio signal, it is more robust to artifacts resulting from the enhancement process.
As shown in Figure 8, linear prediction analysis is performed on both the BC audio signal (using linear prediction block 18) and the noise-reduced AC audio signal (using linear prediction block 20). Linear prediction is performed for each block of audio samples 32 ms in length with a 16 ms overlap. A pre-emphasis filter can also be applied to one or both signals prior to the linear prediction analysis. To improve the performance of the linear prediction analysis and the subsequent equalization of the BC audio signal, the noise-reduced AC audio signal and the BC audio signal can first be time-aligned (not shown) by introducing an appropriate time delay into either audio signal. This time delay can be adaptively determined using cross-correlation techniques.
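For example, the time delay could be estimated from the cross-correlation of the two signals; a minimal sketch is given below (illustrative only, assuming equal-length numpy signals and an assumed search range of ±160 samples):

```python
import numpy as np

def estimate_delay(bc, ac, max_delay=160):
    """Return the lag (in samples) that maximizes the cross-correlation
    between the BC and AC signals, searched over +/- max_delay samples.
    Assumes bc and ac are 1-D numpy arrays of the same length."""
    lags = np.arange(-max_delay, max_delay + 1)
    scores = [np.dot(bc[max(0, -l):len(bc) - max(0, l)],
                     ac[max(0, l):len(ac) - max(0, -l)]) for l in lags]
    return lags[int(np.argmax(scores))]
```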
For the current sample block, the past, present and future predictor coefficients are estimated, converted to line spectral frequencies (LSFs), smoothed, and converted back to linear prediction coefficients. LSFs are used since the linear prediction coefficient representation of the spectral envelope does not lend itself to smoothing. The smoothing is applied to reduce transitional effects during the synthesis operation.
The LP coefficients obtained for the BC audio signal are used to produce the BC excitation signal e. This signal is then filtered (equalized) by the equalizer block 22, which simply uses the smoothed all-pole filter estimated from the noise-reduced AC audio signal, i.e.,
H(z) = 1 / (1 − Σ_{k=1}^{p} ak(AC)·z^(−k)),
where ak(AC) are the smoothed prediction coefficients obtained from the noise-reduced AC audio signal.
Further shaping using the LSFs of the all-pole filter can be applied to the AC all-pole filter to prevent unnecessary boosts in the resulting spectrum.
If a pre-emphasis filter is applied to the signals before the LP analysis, a corresponding de-emphasis filter can be applied to the output of H(z). A wideband gain can also be applied to the output to compensate for the wideband amplification or attenuation resulting from the emphasis filters.
Thus, the output audio signal is derived by filtering a 'clean' excitation signal, obtained from an LP analysis of the BC audio signal, using an all-pole model estimated from the LP analysis of the noise-reduced AC audio signal.
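A per-block sketch of this source-filter equalization is given below, assuming numpy/scipy and an LP order of 16 (an assumption; the order is not fixed in the text), and omitting the windowing, LSF smoothing, pre-emphasis and overlap-add described above; it illustrates the principle rather than the implementation of blocks 18, 20 and 22.

```python
import numpy as np
from scipy.signal import lfilter

def lpc(frame, order):
    """Autocorrelation-method LP analysis via the Levinson-Durbin recursion.
    Returns the error-filter polynomial A = [1, -a_1, ..., -a_p]."""
    r = np.array([np.dot(frame[:len(frame) - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def equalize_block(bc_frame, ac_enhanced_frame, order=16):
    """Filter the BC excitation through the all-pole envelope of the
    noise-reduced AC signal (per-block sketch)."""
    a_bc = lpc(bc_frame, order)
    a_ac = lpc(ac_enhanced_frame, order)
    excitation = lfilter(a_bc, [1.0], bc_frame)   # e(n) = A_bc(z) applied to bc(n)
    return lfilter([1.0], a_ac, excitation)       # output = e(n) filtered by 1/A_ac(z)
```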
Figure 11 shows a comparison between the AC microphone signal in a clean and in a noisy environment and the output of the processing circuit 8 when linear prediction is used. It can be seen that the output audio signal contains considerably fewer artifacts than the noisy AC audio signal and sounds much closer to the clean AC audio signal.
Figure 12 shows a comparison between the spectral power densities of the three signals shown in Figure 11. Here again it can be seen that the spectrum of the output audio signal more closely matches that of the AC audio signal in a clean environment.
Thus, this embodiment of the processing circuit 8 allows a clean (or at least intelligible) speech audio signal to be produced in a poor acoustic environment where the speech is degraded by noise or reverberation.
In another embodiment of the processing circuit 8 (not shown in figure 8), a second speech enhancement block is provided to enhance (reduce the noise in) the BC audio signal provided by the discriminator block 7 before the linear prediction is performed. As with the first speech enhancement block 16, the second speech enhancement block receives the output of the speech detection block 14. The second speech enhancement block is used to apply moderate speech enhancement to the BC audio signal to remove any noise that leaks into the microphone signal. Although the algorithms performed by the first and second speech enhancement blocks may be the same, the actual amount of noise suppression/speech enhancement applied will be different for the AC and BC audio signals.
It will be appreciated that the pendant 2 shown in Figure 2, or another device other than a pendant embodying the invention described above, may include more than two microphones. For example, the cross-section of the pendant 2 could be triangular (requiring three microphones, one on each surface) or square (requiring four microphones, one on each surface). It is even possible for a device 2 to be configured so that more than one microphone can obtain a BC audio signal. In this case, it is possible to combine the audio signals from the several AC (or BC) microphones before the speech enhancement processing by circuit 8, using, for example, beamforming techniques, to produce an AC (or BC) audio signal with an improved SNR. This can help improve the quality and intelligibility of the audio signal output by the processing circuit 8.
When more than one microphone of a particular type (e.g., AC and/or BC) is used in these devices, a general method for classifying the microphones as either AC or BC can be described as follows. First, perform the pairwise classification described in figure 5 or 6 between the microphones, and group them as AC, BC or uncertain. Next, perform the pairwise classification again, this time between the microphones categorized as uncertain and the BC microphones. If two microphones are still categorized as uncertain, then they belong to the BC group; otherwise they belong to the AC microphone group. The second step can also be performed using the AC group instead of the BC group.
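This grouping procedure could be sketched as follows; `classify_pair` is a hypothetical helper standing in for the pairwise classification of figure 5 or 6 (returning a label for each microphone of a pair), and the handling of conflicting or tie results is an assumption made for the example.

```python
def group_microphones(signals, classify_pair):
    """Group microphone signals into 'AC' and 'BC' sets using repeated
    pairwise classification (sketch of the two-step procedure above).
    signals: dict mapping microphone name to its audio signal.
    classify_pair(sig_a, sig_b): returns ('AC'|'BC'|'uncertain') for each signal."""
    labels = {}
    mics = list(signals)
    # Step 1: pairwise classification over all pairs (first label kept, as a simplification)
    for i in range(len(mics)):
        for j in range(i + 1, len(mics)):
            li, lj = classify_pair(signals[mics[i]], signals[mics[j]])
            labels.setdefault(mics[i], li)
            labels.setdefault(mics[j], lj)
    # Step 2: re-test the uncertain microphones against the BC microphones
    bc = [m for m, l in labels.items() if l == "BC"]
    for m in [m for m, l in labels.items() if l == "uncertain"]:
        verdicts = [classify_pair(signals[m], signals[b])[0] for b in bc]
        # Still uncertain against the BC microphones -> treat as BC, otherwise AC
        # (if there are no BC microphones yet, this also defaults to BC)
        labels[m] = "BC" if all(v == "uncertain" for v in verdicts) else "AC"
    return labels
```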
Although the invention has been described above in terms of a pendant forming part of an MPERS, it will be appreciated that the invention can be implemented in other types of electronic device that use sensors or microphones to detect speech. One such device 2 is shown in Figure 13, which is a wired handsfree kit that can be connected to a mobile phone to provide handsfree functionality. The device 2 comprises a headphone (not shown) and a microphone part 30 comprising two microphones 4, 6 which, in use, is placed close to the user's mouth or neck. The microphone part is configured so that either of the two microphones 4, 6 can be in contact with the user's neck, depending on the orientation of the microphone part at any given time.
It will be appreciated that the discriminator block 7 and/or the processing circuit 8 shown in figures 2 and 7 can be implemented as a single processor or as several interconnected processing blocks. Alternatively, it will be appreciated that the functionality of the processing circuit 8 can be implemented in the form of a computer program that is executed by a processor or general-purpose processors within a device. Furthermore, it will be appreciated that the processing circuit 8 can be implemented in a device separate from a device housing the first and/or second microphone 4, 6, with the audio signals being passed between these devices.
It will also be appreciated that the discriminator block 7 and the processing circuit 8 can process the audio signals on a block-by-block basis (i.e., processing one block of audio samples at a time). For example, in the discriminator block 7, the audio signals can be divided into blocks of N audio samples before the FFT is applied. The subsequent processing performed by the discriminator block 7 is then performed on each block of N transformed audio samples. The feature extraction blocks 18, 20 can operate similarly.
Thus, a device and a method of operating the same are provided that allow an audio signal representative of a user's speech to be obtained from the BC and AC audio signals, even where the device is free to move with respect to the user, causing the microphones providing the BC and AC signals to change.
While the invention has been illustrated and described in detail in the drawings and the foregoing description, such illustration and description are to be considered illustrative or exemplary and not restrictive; the invention is not limited to the disclosed embodiments.
Variations on the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention, from a study of the drawings, the disclosure and the appended claims. In the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil the functions of several items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage. A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims should not be construed as limiting the scope.
权利要求:
Claims (14)
[0001]
1. METHOD FOR OPERATING A DEVICE, the device being characterized by comprising a plurality of audio sensors and being configured so that when a first audio sensor of the plurality of audio sensors is in contact with a user of the device, a second audio sensor of the plurality of audio sensors is in contact with air, the method comprising: obtaining respective audio signals representing speech of a user from the plurality of audio sensors (101); and analyzing the respective audio signals to determine which, if any, of the plurality of audio sensors is in contact with the user of the device (103, 105).
[0002]
2. METHOD, according to claim 1, characterized in that the analysis step (103, 105) comprises the analysis of the spectral properties of each of the audio signals.
[0003]
3. METHOD, according to claim 1 or 2, characterized in that the analysis step (103, 105) comprises the analysis of the power of the respective audio signals above a threshold frequency.
[0004]
4. METHOD according to claim 3, characterized in that it is determined that an audio sensor is in contact with the user of the device if the power of its respective audio signal above the threshold frequency is less than the power of the audio signal of another audio sensor above the threshold frequency by more than a predetermined amount.
[0005]
5. METHOD, according to any one of claims 1 to 4, characterized in that the analysis step (103, 105) comprises: applying an N-point Fourier transform to each audio signal (113); determining power spectrum information below a threshold frequency for each of the Fourier-transformed audio signals (113); normalizing the Fourier-transformed audio signals of the two sensors to each other according to the determined information (115); and comparing the power spectra above the threshold frequency of the normalized Fourier-transformed audio signals to determine which, if any, of the plurality of audio sensors is in contact with the user of the device (117).
[0006]
6. METHOD, according to claim 5, characterized in that the step of determining the information comprises determining the value of a maximum peak in the power spectrum below the threshold frequency for each of the audio signals submitted to the Fourier transform (115) .
[0007]
7. METHOD, according to claim 5, characterized in that the step of determining the information comprises summing the power spectrum below the threshold frequency for each of the Fourier-transformed audio signals (115).
[0008]
8. METHOD according to any one of claims 5, 6 or 7, characterized in that it is determined that an audio sensor is in contact with the user of the device if the power spectrum above the threshold frequency for its respective Fourier-transformed audio signal is less than the power spectrum above the threshold frequency for the Fourier-transformed audio signal of another audio sensor by more than a predetermined amount.
[0009]
9. METHOD according to any one of claims 5, 6, 7 or 8, characterized in that it is determined that no audio sensor is in contact with the user of the device if the power spectra above the threshold frequency for the Fourier-transformed audio signals differ by less than a predetermined amount.
[0010]
10. METHOD according to any one of claims 1 to 9, characterized in that it further comprises the step of: providing the audio signals to circuitry that processes the audio signals to produce an output audio signal that represents the user's speech according to the result of the analysis step.
[0011]
11. DEVICE (2), characterized in that it comprises: a plurality of audio sensors (4, 6) arranged in the device (2) so that when a first audio sensor (4, 6) of the plurality of audio sensors (4 , 6) is in contact with a user of the device (2), a second audio sensor (4, 6) of the plurality of audio sensors (4, 6) is in contact with air; and circuit (7) which is configured to: obtain respective audio signals representing a user's speech from a plurality of audio sensors (4, 6); and analyzing the respective audio signals to determine which, if any, of the plurality of audio sensors (4, 6) is in contact with the user of the device (2).
[0012]
12. DEVICE (2) according to claim 11, characterized in that the circuit (7) is configured to analyze the power of the respective audio signals above the threshold frequency.
[0013]
13. DEVICE (2) according to claim 11 or 12, characterized in that the circuit (7) is configured to analyze the respective audio signals by: applying an N-point Fourier transform to each audio signal; determining power spectrum information below the threshold frequency for each of the Fourier-transformed audio signals; normalizing the Fourier-transformed audio signals of the two sensors to each other according to the determined information; and comparing the power spectra above the threshold frequency of the normalized Fourier-transformed audio signals to determine which, if any, of the plurality of audio sensors (4, 6) is in contact with the user of the device (2).
[0014]
14. DEVICE (2) according to claim 11, 12 or 13, characterized in that it further comprises: processing circuitry (8) for receiving the audio signals and for processing the audio signals in accordance with the analysis result to produce an output audio signal that represents the user's speech.
类似技术:
公开号 | 公开日 | 专利标题
BR112013012539B1|2021-05-18|method to operate a device and device
JP6034793B2|2016-11-30|Audio signal generation system and method
KR20060044629A|2006-05-16|Isolating speech signals utilizing neural networks
BR112015020150B1|2021-08-17|APPLIANCE TO GENERATE A SPEECH SIGNAL, AND, METHOD TO GENERATE A SPEECH SIGNAL
JP5000647B2|2012-08-15|Multi-sensor voice quality improvement using voice state model
KR20210038871A|2021-04-08|Detection of replay attacks
Peer et al.2008|Reverberation matching for speaker recognition
Maruri et al.2018|V-speech: Noise-robust speech capturing glasses using vibration sensors
Jokinen et al.2016|The Use of Read versus Conversational Lombard Speech in Spectral Tilt Modeling for Intelligibility Enhancement in Near-End Noise Conditions.
JP2007068847A|2007-03-22|Glottal closure region detecting apparatus and method
BR112014009338B1|2021-08-24|NOISE Attenuation APPLIANCE AND NOISE Attenuation METHOD
CN111833896A|2020-10-27|Voice enhancement method, system, device and storage medium for fusing feedback signals
Rahman et al.2010|Pitch characteristics of bone conducted speech
Rahman et al.2019|Multisensory speech enhancement using lower‐frequency components from bone‐conducted speech
JP2016146576A|2016-08-12|Measuring method and measuring tool and correction method of reproduction characteristics of earphone and application program of measurement and application program of correction
Martínez et al.1997|ASR in highly non-stationary environments using adaptive noise canceling techniques
JP2020190606A|2020-11-26|Sound noise removal device and program
Na et al.2018|Noise reduction algorithm with the soft thresholding based on the Shannon entropy and bone-conduction speech cross-correlation bands
Nguyen et al.2009|Selective time-reversal block solution to the stereophonic acoustic echo cancellation problem
KR20100025140A|2010-03-09|Method of voice source separation
Vaziri et al.2019|Evaluating noise suppression methods for recovering the Lombard speech from vocal output in an external noise field
Abu-El-Quran et al.2012|Multiengine speech processing using snr estimator in variable noisy environments
BR112015007625B1|2021-12-21|DEVICE, METHOD OF GENERATION OF AN AUDIO INTERFERENCE MEASURE AND COMPUTER-LEABLE STORAGE MEDIA
Jeub et al.2009|Dereverberation of speech signals based on the discrete model of speech production
BR112015019040B1|2021-12-07|SYSTEMS AND METHODS OF PERFORMING FILTERING TO DETERMINE GAIN
同族专利:
公开号 | 公开日
WO2012069973A1|2012-05-31|
US9538301B2|2017-01-03|
WO2012069973A9|2013-05-10|
EP2643981A1|2013-10-02|
JP6031041B2|2016-11-24|
RU2013128560A|2014-12-27|
RU2605522C2|2016-12-20|
BR112013012539A2|2020-08-04|
JP2014501089A|2014-01-16|
CN103229517B|2017-04-19|
EP2643981B1|2014-09-17|
CN103229517A|2013-07-31|
US20140119548A1|2014-05-01|
引用文献:
公开号 | 申请日 | 公开日 | 申请人 | 专利标题

JPS42962Y1|1965-06-03|1967-01-20|
JPS5836526A|1981-08-25|1983-03-03|Rion Co|Contact microphone|
JPH02962A|1988-05-25|1990-01-05|Mitsubishi Electric Corp|Formation of photomask|
EP0683621B1|1994-05-18|2002-03-27|Nippon Telegraph And Telephone Corporation|Transmitter-receiver having ear-piece type acoustic transducing part|
JPH07312634A|1994-05-18|1995-11-28|Nippon Telegr & Teleph Corp <Ntt>|Transmitter/receiver for using earplug-shaped transducer|
JP3876061B2|1997-10-06|2007-01-31|Necトーキン株式会社|Voice pickup device|
JP2000261530A|1999-03-10|2000-09-22|Nippon Telegr & Teleph Corp <Ntt>|Speech unit|
JP2000354284A|1999-06-10|2000-12-19|Iwatsu Electric Co Ltd|Transmitter-receiver using transmission/reception integrated electro-acoustic transducer|
JP2001224100A|2000-02-14|2001-08-17|Pioneer Electronic Corp|Automatic sound field correction system and sound field correction method|
JP2002125298A|2000-10-13|2002-04-26|Yamaha Corp|Microphone device and earphone microphone device|
US6952672B2|2001-04-25|2005-10-04|International Business Machines Corporation|Audio source position detection and audio adjustment|
KR20030040610A|2001-11-15|2003-05-23|한국전자통신연구원|A method for enhancing speech quality of sound signal inputted from bone conduction microphone|
JP2004279768A|2003-03-17|2004-10-07|Mitsubishi Heavy Ind Ltd|Device and method for estimating air-conducted sound|
US7447630B2|2003-11-26|2008-11-04|Microsoft Corporation|Method and apparatus for multi-sensory speech enhancement|
US7499686B2|2004-02-24|2009-03-03|Microsoft Corporation|Method and apparatus for multi-sensory speech enhancement on a mobile device|
US7283850B2|2004-10-12|2007-10-16|Microsoft Corporation|Method and apparatus for multi-sensory speech enhancement on a mobile device|
JP2006126558A|2004-10-29|2006-05-18|Asahi Kasei Corp|Voice speaker authentication system|
EP1640972A1|2005-12-23|2006-03-29|Phonak AG|System and method for separation of a users voice from ambient sound|
US8214219B2|2006-09-15|2012-07-03|Volkswagen Of America, Inc.|Speech communications system for a vehicle and method of operating a speech communications system for a vehicle|
CN101150883A|2006-09-20|2008-03-26|南京Lg同创彩色显示系统有限责任公司|Audio output device of display|
JP5075676B2|2008-02-28|2012-11-21|株式会社オーディオテクニカ|Microphone|
US8675884B2|2008-05-22|2014-03-18|DSP Group|Method and a system for processing signals|
JP5256119B2|2008-05-27|2013-08-07|パナソニック株式会社|Hearing aid, hearing aid processing method and integrated circuit used for hearing aid|
CN101645697B|2008-08-07|2011-08-10|英业达股份有限公司|System and method for controlling sound volume|
US20100224191A1|2009-03-06|2010-09-09|Cardinal Health 207, Inc.|Automated Oxygen Delivery System|
EP2458586A1|2010-11-24|2012-05-30|Koninklijke Philips Electronics N.V.|System and method for producing an audio signal|US7148879B2|2000-07-06|2006-12-12|At&T Corp.|Bioacoustic control system, method and apparatus|
EP2458586A1|2010-11-24|2012-05-30|Koninklijke Philips Electronics N.V.|System and method for producing an audio signal|
CN103890843B|2011-10-19|2017-01-18|皇家飞利浦有限公司|Signal noise attenuation|
US8908894B2|2011-12-01|2014-12-09|At&T Intellectual Property I, L.P.|Devices and methods for transferring data through a human body|
JP6580990B2|2012-10-09|2019-09-25|聯發科技股▲ふん▼有限公司Mediatek Inc.|Method and apparatus for audio interference estimation|
US9595271B2|2013-06-27|2017-03-14|Getgo, Inc.|Computer system employing speech recognition for detection of non-speech audio|
US10108984B2|2013-10-29|2018-10-23|At&T Intellectual Property I, L.P.|Detecting body language via bone conduction|
US9594433B2|2013-11-05|2017-03-14|At&T Intellectual Property I, L.P.|Gesture-based controls via bone conduction|
US9349280B2|2013-11-18|2016-05-24|At&T Intellectual Property I, L.P.|Disrupting bone conduction signals|
US9715774B2|2013-11-19|2017-07-25|At&T Intellectual Property I, L.P.|Authenticating a user on behalf of another user based upon a unique body signature determined through bone conduction signals|
US9405892B2|2013-11-26|2016-08-02|At&T Intellectual Property I, L.P.|Preventing spoofing attacks for bone conduction applications|
US9582071B2|2014-09-10|2017-02-28|At&T Intellectual Property I, L.P.|Device hold determination using bone conduction|
US10045732B2|2014-09-10|2018-08-14|At&T Intellectual Property I, L.P.|Measuring muscle exertion using bone conduction|
US9589482B2|2014-09-10|2017-03-07|At&T Intellectual Property I, L.P.|Bone conduction tags|
US9882992B2|2014-09-10|2018-01-30|At&T Intellectual Property I, L.P.|Data session handoff using bone conduction|
US9600079B2|2014-10-15|2017-03-21|At&T Intellectual Property I, L.P.|Surface determination via bone conduction|
EP3211918B1|2014-10-20|2021-08-25|Sony Group Corporation|Voice processing system|
US10431240B2|2015-01-23|2019-10-01|Samsung Electronics Co., Ltd|Speech enhancement method and system|
GB201615538D0|2016-09-13|2016-10-26|Nokia Technologies Oy|A method , apparatus and computer program for processing audio signals|
GB201713946D0|2017-06-16|2017-10-18|Cirrus Logic Int Semiconductor Ltd|Earbud speech estimation|
US10831316B2|2018-07-26|2020-11-10|At&T Intellectual Property I, L.P.|Surface interface|
法律状态:
2020-08-18| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2020-09-01| B25D| Requested change of name of applicant approved|Owner name: KONINKLIJKE PHILIPS N.V. (NL) |
2020-09-08| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2020-09-24| B25G| Requested change of headquarter approved|Owner name: KONINKLIJKE PHILIPS N.V. (NL) |
2021-03-02| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-05-18| B16A| Patent or certificate of addition of invention granted|Free format text: PRAZO DE VALIDADE: 20 (VINTE) ANOS CONTADOS A PARTIR DE 21/11/2011, OBSERVADAS AS CONDICOES LEGAIS. |
优先权:
申请号 | 申请日 | 专利标题
EP10192400|2010-11-24|
EP10192400.9|2010-11-24|
PCT/IB2011/055198|WO2012069973A1|2010-11-24|2011-11-21|A device comprising a plurality of audio sensors and a method of operating the same|